





Basics and Chipsets 


staging a DIY Stand-Alone MP3 Player 


By Prof. F. P. Volpe and P. Elsesser










Traditionally, an MP3 player requires a semiconductor memory to store MP3
data. MP3 players capable of reading data directly from a CD and without
‘help’ from a PC are few and far between. That may change, however, with
the home-brew MP3 player described in this short series of articles.


20 Elektor Electronics 5/2000 


GENERAL INTEREST


[Figure 1 plot: amplitude versus frequency, showing a strong tone signal
and the region where weaker signals are masked.]
Figure 1. A frequency range is masked (obliterated) by a loud signal.


Music programme material disguised as MP3 files has become big business on
the Internet. However, before you can actually play a music CD with your
own selection of MP3 titles you typically have to provide a link between
your stereo and your PC, by way of the soundcard. Arguably, a stand-alone
MP3 player with an internal standard CD-ROM drive and a hardware MP3
decoder, ready for connection to your stereo, represents a much better
solution because it avoids the tedious process of having to ‘boot’ your PC
every time you want to play some music. Also, the often objectionable
noise level added by the PC, and its very presence in the living room,
should be considered.

Such an MP3 player will be described in the July/August 2000 issue of
Elektor Electronics, and the present article is intended as an
introduction to it. This month we will describe the outlines and basics of
MP3 compression, as well as some frequently seen chip sets for MP3
decoding. We will also gaze into the crystal ball for a bit in relation to
future audio compression technologies.

The follow-up article, to be published in our annual Summer Circuits issue
(i.e., July/August 2000), will cover ‘hands-on’ matters like the
construction of the MP3 decoder board, the ATAPI support by a
microcontroller and the link to a CD-ROM drive.

[Figure 2 plot: sound pressure level (SPL, reference 20 µPa) versus
frequency (Hz), with curves labelled in phon (loudness level) up to
120 phon.]
Figure 2. The human ear is particularly sensitive to speech signals.


Why compression? 


A digital audio signal typically consists of samples with a size of
16 bits. According to the sampling theorem, samples have to be taken at a
rate of at least twice the highest frequency in the programme material. If
CD quality is required (sampling rate 44.1 kHz, two channels), then a data
rate of about 1.4 Mbit/s is needed to convey an audio signal. In other
words, one minute of music requires about 10 Mbytes of data to be conveyed
or stored on a data carrier. Unfortunately, that is not practicable even
with the huge capacity of today’s hard disks. Obviously, the resultant
transmission (= download) times using media like the Internet, Internet
radio or Music On Demand systems would be very long and therefore
prohibitive. In this case, the only solution is to find a way to reduce
the immense size of the relevant data, the ‘real-world’ aim being to
convey a stereo music signal across an ISDN link with a capacity of
64 kbit/s per channel. In view of the required compression ratio of 1:12,
that would seem to be possible only if losses are accepted.

Over a relatively short period, a system called MPEG-1 Layer 3 has
established itself as the de facto standard for audio transmission via the
Internet. MP3 employs compression algorithms that take the real-life
response of the human ear into account. The resultant quality of
reproduced sound is so good that even trained experts are unable to hear
the difference between a copy and its original. MP3 is also clearly
superior to the other MPEG layers, Layer 1 (MP1, used in the Digital
Compact Cassette) and Layer 2 (MP2, used in Digital Audio Broadcasting and
the Video-CD), and to simpler systems like CELP, µ-law or ADPCM.
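
The data-rate figures quoted above can be checked with a few lines of
Python. This is only a sketch: the constant and function names are ours,
and the 128 kbit/s target assumes that both 64-kbit/s ISDN channels are
used for the stereo signal.

```python
# Rough data-rate arithmetic for uncompressed CD audio (illustrative sketch).
SAMPLE_RATE_HZ = 44_100        # CD sampling rate
BITS_PER_SAMPLE = 16
CHANNELS = 2                   # stereo
ISDN_BITS_PER_S = 2 * 64_000   # assumption: two 64-kbit/s ISDN channels

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of linear PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

def bytes_per_minute(bit_rate):
    """Storage needed for one minute of audio, in bytes."""
    return bit_rate * 60 // 8

cd_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"PCM bit rate:      {cd_rate / 1e6:.2f} Mbit/s")
print(f"One minute:        {bytes_per_minute(cd_rate) / 1e6:.1f} Mbyte")
print(f"Compression ratio: 1:{cd_rate / ISDN_BITS_PER_S:.1f}")
```

The exact rate is 1,411,200 bit/s, which is 1.41 Mbit/s in decimal units
or about 1.35 Mbit/s if a megabit is counted as 2^20 bits. The ratio works
out at roughly 1:11 for a 44.1 kHz source, in the same region as the 1:12
mentioned in the text.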


The essential point about MP3 is that the system is based on a
psycho-acoustic model whose elementary structure will be discussed below.
A far more extensive tutorial about MPEG and audio compression techniques
is available in the form of a downloadable file (mpeg-tutorial.pdf) from
the Download Area on the Elektor Electronics website at

http://www.elektor-electronics.co.uk 
(document reproduced courtesy IEEE). 

Fortunately, this document does not rely on 
higher mathematics to explain the underlying 
principles! 




Psycho-acoustics 


The science of psycho-acoustics studies the behaviour of the human hearing
system in relation to the processing of acoustic information in the brain.
Psycho-acoustic principles have been extensively used in the development
of MP3 and indeed many other compression techniques.

The audible spectrum may be thought of as consisting of 26 frequency
bands. The frequency range below 500 Hz is subdivided into five bands of
100 Hz each. Above this range, the bandwidth is about 1/5th of the centre
frequency. In the human hearing system, soft sounds become less distinct
and even inaudible when loud, discrete sounds occur within these
‘critical’ bands. As an aside, you should note that the frequency
resolution of the human hearing system is much finer than the critical
bands.
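
The rule of thumb for the critical bands can be turned into a short
Python sketch. The function name and the 20 kHz upper limit are our
assumptions, and the band edges are only as accurate as the rule itself.

```python
def critical_band_edges(f_max_hz=20_000.0):
    """Approximate critical-band edges in Hz, following the rule of thumb in the text."""
    # Five bands of 100 Hz each below 500 Hz.
    edges = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
    while edges[-1] < f_max_hz:
        lo = edges[-1]
        # width = centre / 5 and centre = lo + width / 2, hence width = lo / 4.5
        edges.append(lo + lo / 4.5)
    return edges

edges = critical_band_edges()
for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:8.0f} .. {hi:8.0f} Hz  (width {hi - lo:6.0f} Hz)")
```

The number of bands produced depends on the assumed upper limit and on
how literally the 1/5th rule is taken, so it need not match the 26 bands
quoted above exactly.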

This phenomenon is what allows a soft sound to be masked by a loud sound
that occurs at a nearby frequency and/or instant. The spectral masking
range of an individual loud sound is shown in Figure 1. The masking (or,
if you like, obliterating) sound may occur simultaneously with the soft
sound or even after it.

The actual effect of the masking depends not only on the spectral and
time-related arrangement of the masking and masked sounds, but also on
their tonality. Noise is far easier to mask than a discrete frequency.
Conversely, a discrete frequency is a much better mask than noise.


The starting point for a psycho-acoustic model is the frequency-dependent
characteristic of the human hearing system. The curves in Figure 2
illustrate that sounds within the frequency range of human speech are
perceived with greater resolution than very high or very low sounds. The
lower, dashed, curve represents the absolute hearing threshold, below
which no sounds are perceived by most of us. A masking operation is
applied to raise the absolute threshold to the so-called co-hearing
threshold, which appears three times as dashed lines in Figure 3, once for
each of three different masking signals.

A psycho-acoustic model analyses the audio signal and employs complex
algorithms to compute the usability of masking sounds in the relevant
frequency range. The closer the model gets to reality, the higher the
compression rate that can be achieved at a given quality level of the
output signal. However, the rules of the model are ‘relaxed’ to an extent
required by the transmission rate and the signal quality. In cases where
no compression is required, the model is even switched off completely.

[Figure 3 plot: level L (dB) versus frequency f (Hz), 20 Hz to 20 kHz.]
Figure 3. Three masks are applied to raise the co-hearing threshold.


Functional blocks 


[Figure 4 diagram: audio signal (PCM, 768 kbit/s); filter bank (32
sub-bands); MDCT; psycho-acoustic model; non-linear quantisation; Huffman
encoding; bit rate control; auxiliary information encoding; bitstream
formatting and error correction; encoded bitstream.]
Figure 4. Block diagram of an encoder to MPEG 2.5 Layer III.

[Figure 5 diagram: CD-ROM drive, PIC microcontroller, keyboard, MAS3507
decoder, DAC3550, audio outputs.]
Figure 5. MP3 CD player based on the MAS3507D decoder and DAC3550
digital-to-analogue converter, both from Micronas Intermetall. The setup
is controlled by a PIC17C756.

[Figure 6 diagram: CD-ROM, ATAPI, LCD, Mic., KS17C4000, keyboard,
SmartMedia card, audio outputs.]
Figure 6. MP3 CD player using the Samsung TL7231MD decoder. Besides
decoding MP3 data this chip also handles ADPCM-based audio signal
compression.

[Figure 7 diagram: CD-ROM (ISO9660), PIC17C756, I2C bus, keyboard, STA013,
CS4331, audio outputs.]
Figure 7. MP3 CD player based on the STA013 decoder from
STMicroelectronics and the CS4331 DAC from Crystal Semiconductor.

The elementary architecture of an MPEG audio encoder is illustrated in
Figure 4. Decompression is the reverse operation of encoding. To be able
to apply the psycho-acoustic model to the digital audio input signal (a
PCM datastream arriving at 768 kbit/s), the signal first has to be
transposed to the frequency domain. A fast Fourier transform with 1024
coefficients is employed as part of the computation of the psycho-acoustic
model. Fully synchronous with this process, a filter bank divides the
audible spectrum into 32 sub-bands of equal width (750 Hz at a sampling
rate of 48 kHz). Next, each sub-band is sub-sampled by a factor of 32, so
that every 32 PCM input samples yield one sample in each sub-band. The
filter bands resemble the previously mentioned critical frequency bands of
the human hearing system, with three differences. Firstly, the sub-bands
have a constant width, unlike the critical bands of the ear; secondly, the
filter bank and its complementary function in the decoder are not
loss-free; and thirdly, an individual frequency may affect the output of
more than one sub-band because of a certain overlap of the sub-bands.
Fortunately, the errors introduced by the filter bank are small enough to
be inaudible (<0.07 dB).

This is mainly due to the final MDCT (Modified Discrete Cosine Transform),
which divides each of these 32 sub-bands into 18 sub-sub-bands. In the end
we have 32 x 18 = 576 sub-bands of about 42 Hz each (at a sampling rate of
48 kHz).
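
The band widths quoted for the filter bank and the MDCT follow directly
from the sampling rate, as this short Python sketch shows. The names are
ours, and reading the 768 kbit/s PCM rate as one 16-bit channel sampled at
48 kHz is our assumption.

```python
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
SUB_BANDS = 32                 # outputs of the filter bank
MDCT_LINES_PER_SUB_BAND = 18   # each sub-band is split again by the MDCT

def band_width_hz(sample_rate_hz, n_bands):
    """Width of each band when the range 0 .. fs/2 is split into n_bands equal parts."""
    return (sample_rate_hz / 2) / n_bands

pcm_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE          # assumed: one channel
total_lines = SUB_BANDS * MDCT_LINES_PER_SUB_BAND

print(pcm_rate)                                       # 768000 bit/s
print(band_width_hz(SAMPLE_RATE_HZ, SUB_BANDS))       # 750.0 Hz per sub-band
print(total_lines)                                    # 576 frequency lines
print(round(band_width_hz(SAMPLE_RATE_HZ, total_lines), 1))  # 41.7 Hz per line
```

At a sampling rate of 44.1 kHz the same function gives proportionally
narrower bands.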

In the next operation, noise allocation, the ratio of the quantisation
noise and the co-hearing threshold is obtained from the difference between
the signal-to-noise ratio (filter bank) and the signal-to-mask ratio of
the psycho-acoustic model. The number of bits available in the output
datastream is then determined (depending on the selected overall data
rate, minus the bits for scaling factors, headers and other auxiliary
data). The encoder varies the quantisation in a certain sequence, weights
the spectral values and counts the number of bits required for the output
signal (the number being further reduced by Huffman encoding). The outcome
is used to compute the permissible quantisation noise level. If, after
quantisation, bands are found with unacceptably high distortion, the
encoder amplifies these bands and so increases the number of quantisation
steps available to them. The operation is repeated until distortion levels
are below the acceptable level. MP3 works with a variable transmission
rate and so creates a kind of ‘buffer’ during signals that require only a
few bits. The buffer is employed for ‘difficult’ encoding operations, for
which the available data rate would otherwise be exhausted.
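
The idea of varying the quantisation until the result fits the available
bits can be illustrated with a toy Python sketch. It is not the encoder
defined by the standard: the bit count is a crude stand-in for Huffman
coding, the function names are ours, and the distortion check per band is
left out. Only the power-law (non-linear) quantisation and the step-wise
coarsening follow the general scheme.

```python
import math

def quantise(values, step):
    """Non-linear (power-law) quantisation of spectral values with one global step size."""
    return [int(math.copysign(round((abs(v) / step) ** 0.75), v)) for v in values]

def estimate_bits(quantised):
    """Crude stand-in for Huffman coding: magnitude bits plus one extra bit per value."""
    return sum(abs(q).bit_length() + 1 for q in quantised)

def rate_loop(values, bit_budget, step=1.0, growth=2 ** 0.25):
    """Coarsen the quantiser step until the estimated bit count fits the budget."""
    if bit_budget < len(values):
        raise ValueError("budget too small: every value costs at least one bit here")
    while True:
        quantised = quantise(values, step)
        bits = estimate_bits(quantised)
        if bits <= bit_budget:
            return step, quantised, bits
        step *= growth   # coarser quantisation, fewer bits, more noise

values = [100.0, -50.0, 10.0, 1.0, 0.1]
step, q, bits = rate_loop(values, bit_budget=12)
print(step, q, bits)
```

A real encoder wraps a second loop around this one, which checks the
quantisation noise in each band against the masking threshold and
amplifies the bands that fail.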

The block diagram of the encoder is completed with a formatting unit that
serves to add the auxiliary data, and pack the lot into frames ready for
sending to the decoder.
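
Each frame starts with a 32-bit header that a player can use to find
frame boundaries in the data read from the CD-ROM. The Python sketch below
decodes such a header. The bit layout and tables are taken from the MPEG-1
audio standard rather than from this article, the function name is ours,
and only MPEG-1 Layer III headers are handled.

```python
import struct

# MPEG-1 Layer III tables; index 0 ('free format') and index 15 are rejected here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]

def parse_mp3_header(header_bytes):
    """Decode a 4-byte MPEG-1 Layer III frame header; return None if it is not one."""
    if len(header_bytes) != 4:
        return None
    (word,) = struct.unpack(">I", header_bytes)
    if word >> 21 != 0x7FF:                 # 11-bit sync word, all ones
        return None
    version = (word >> 19) & 0x3            # 3 = MPEG-1
    layer = (word >> 17) & 0x3              # 1 = Layer III
    if version != 3 or layer != 1:
        return None
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        return None
    padding = (word >> 9) & 0x1
    # Frame length in bytes for MPEG-1 Layer III (1152 samples per frame).
    frame_bytes = 144 * bitrate * 1000 // sample_rate + padding
    return {"bitrate_kbps": bitrate, "sample_rate_hz": sample_rate, "frame_bytes": frame_bytes}

print(parse_mp3_header(b"\xff\xfb\x90\x00"))
```

For the example header (128 kbit/s at 44.1 kHz, no padding) the frame
length comes out at 417 bytes.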


From chipset to MP3 player 


Different chipsets are currently available for the purpose of decoding MP3
signals. In this article, three options are discussed for realizing a
stand-alone CD player for MP3 and audio data, based on three different
chip sets from different manufacturers. The data source is invariably a
standard ATAPI-compatible CD-ROM drive. A microcontroller is used to look
after the ATAPI protocol and supply the data for the MP3 decoder. The same
controller also drives an LCD, and scans keys for navigation within the CD
directory structure. A digital-to-analogue converter converts decompressed
data supplied by the MP3 decoder into plain audio signals.

At the time of writing, MP3 chipsets are offered by three companies.
Figure 5 shows an MP3 player based on a chipset supplied by Micronas. This
player is controlled by a PIC17C756 which reads the MP3 data from a CD-ROM
drive and transmits them serially to a decoder chip type MAS3507D. You can
browse the CD contents and navigate by means of a keyboard and a readout
(LCD). The display also shows the information held in the ID3 tag. Decoded
data are applied to a digital-to-analogue converter (DAC) type DAC3550,
which supplies the analogue audio signals. After cleaning in a third-order
low-pass filter, the audio signals are allowed to leave the decoder.

Besides a digital input, the converter also sports an analogue input. An
I2C command is used to select between these inputs. If the microcontroller
detects a music CD in the CD-ROM drive, the DAC is ‘ordered’ to switch to
its analogue input.

Another chipset is shown in Figure 6. Here, the Samsung TL7231MD is used
as the MP3 decoder. Because this chip integrates a digital-to-analogue
converter, the number of components is drastically reduced. In addition to
the MP3 decoder function, the TL7231MD also has an ADPCM codec. Using this
chip it is possible to store audio signals in compressed form, and play
them back again.

If PCB space is at a premium, the solution is to employ the STA013 from
STMicroelectronics (Figure 7). This chip comes in an SO-28 case. The
CS4331 digital-to-analogue converter from Crystal Semiconductor comes in
an SO-8 enclosure. It should be noted that this MP3 decoder wants a
configuration file after every reset. This file is supplied by the ST7
central controller in the system.

(000044-1) 


Note: 

The construction of a Stand-Alone MP3 Player will be discussed in the
July/August 2000 issue of Elektor Electronics.


Literature 

- Micronas Intermetall: MAS 3507D MPEG 1/2 Layer 2/3 Audio Decoder.
  Preliminary Data Sheet, Edition Oct. 21, 1998.

- Micronas Intermetall: DAC 3550 Stereo Audio DAC. Preliminary Data Sheet,
  Edition April 23, 1999.

- Samsung Semiconductor: TL7231MD Full Layer-II ISO/IEC 11172-3 Audio
  Decoder. Data Sheet.

- STMicroelectronics: STA013/STA013T MPEG 2.5 Layer III Audio Decoder.
  Data Sheet, September 1999.


Compression and the future 


A number of other audio compression methods exist besides MPEG Layer III.
Some of these are well established, others are still ‘being designed’. At
least for the near future MP3 seems to have won the battle for dominance
on the Internet. However, Dolby’s AC-3 system has pushed MPEG aside in the
DVD market.


MPEG-2 

allows ‘low’ sampling frequencies like 16 kHz, 22.05 kHz and 24 kHz
besides the more usual 32 kHz, 44.1 kHz and 48 kHz standards. It also
supports 5.1 Surround as well as multi-language broadcasts. Otherwise it
is downwards compatible with MPEG-1.


MPEG-2.5 

again halves the sampling rates, allowing bit rates as low as 8 kbit/s to
be achieved. Otherwise this standard is downwards compatible with MPEG-1.
The Micronas chipset used in the Stand-Alone MP3 Player is designed for
MPEG-2.5.
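
The relationship between the three sets of sampling rates is a simple
halving, sketched here in Python (the names are ours):

```python
MPEG1_RATES_HZ = (32_000, 44_100, 48_000)

def halve(rates_hz):
    """Return the given sampling rates divided by two."""
    return tuple(rate // 2 for rate in rates_hz)

MPEG2_LOW_RATES_HZ = halve(MPEG1_RATES_HZ)    # 16 kHz, 22.05 kHz, 24 kHz
MPEG25_RATES_HZ = halve(MPEG2_LOW_RATES_HZ)   # 8 kHz, 11.025 kHz, 12 kHz

print(MPEG2_LOW_RATES_HZ)
print(MPEG25_RATES_HZ)
```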


MPEG-2-AAC (Advanced Audio Coding) 

is not downwards compatible. It allows sampling frequencies of 32 kHz,
44.1 kHz and 48 kHz as well as single-channel, dual-channel, 5.1 Surround
and multi-language broadcasts. Provision is made for error
detection/correction. AAC is increasingly used by broadcast stations and,
interestingly, for a new satellite radio station covering the southern
hemisphere. In Japan, HDTV and all broadcast systems are to employ AAC.


MPEG-4 

allows acoustic events in a room to be described. This is of particular interest to 
multimedia applications because a ‘speed control’ enables different media to be 
synchronized. 


ATRAC 

is used in MiniDisc systems and supplies a data rate of 140 kbit/s per
channel. The main difference with MPEG-1 is that the filter bank uses a
different method to resolve the spectrum. In higher frequency bands, the
frequency resolution is reduced and the time resolution is increased (the
‘higher’ filter sections operate faster).


Dolby AC-3 

is MPEG’s biggest competitor and is extensively applied with DVD and HDTV.
AC-3 supports various formats including 5.1 Surround. The bit rate may lie
between 32 kbit/s and 640 kbit/s. The quality is about the same as MPEG-2,
but below that of AAC with 5.1 Surround.


MS Audio 
Impossible compression system devised by Microsoft. 


- STMicroelectronics: AN1090 STA013 MPEG 2.5 Layer III Source Decoder.
  Application Note, March 1999.

- Draft ISO/IEC 9660: Information technology - Volume and file structure
  of CD-ROM for information interchange.

- A Tutorial on MPEG/Audio Compression, published in IEEE Multimedia
  Journal, Summer 1995.




